Malay language modeling in large vocabulary continuous speech recognition with linguistic information

Authors

  • Kai Sze Hong
  • Tien Ping Tan
  • Enya Kong Tang
  • Yu-N Cheah
Abstract

This paper discusses our recent progress in developing and evaluating a Malay large vocabulary continuous speech recognizer (LVCSR) that takes linguistic information into account. The best baseline system has a word error rate (WER) of 15.8%. To propose methods for improving accuracy further, additional experiments were performed using linguistic information such as part-of-speech and stem. We also tested our system with a language model built from a small amount of text, and the results suggest that linguistic knowledge can be used to improve the accuracy of a Malay automatic speech recognition system.
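
For readers unfamiliar with the metric, WER is conventionally the word-level edit distance (substitutions, deletions, insertions) between the recognizer output and the reference transcript, divided by the number of reference words. The sketch below illustrates that standard definition only; it is not the authors' scoring tool, and the function name and the Malay example sentence are made up for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example (not from the paper): one deleted word out of four.
print(word_error_rate("saya suka makan nasi", "saya suka nasi"))
```

Under this definition, a WER of 15.8% corresponds to roughly 16 word errors per 100 reference words.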

Similar resources

Spoken Term Detection for Persian News of Islamic Republic of Iran Broadcasting

Islamic Republic of Iran Broadcasting (IRIB), as one of the biggest broadcasting organizations, produces thousands of hours of media content daily. Accordingly, the IRIB's archive is one of the richest archives in Iran, containing a huge amount of multimedia data. Monitoring this massive volume of data, and browsing and retrieving this archive, is one of the key issues for this broadcasting...

A Scalable System for Embedded Large Vocabulary Continuous Speech Recognition

This paper presents a system for large vocabulary continuous speech recognition under constrained hardware resources. We investigate efficient pruning and caching strategies aimed at handling extensive acoustic and linguistic modeling. Software components are analyzed in terms of resource consumption. Then, we evaluate the system performance in an extreme configuration where acoustic and ling...

Context-dependent Phone Mapping for Acoustic Modeling of Under-resourced Languages

This paper presents the use of phone mapping for acoustic modeling of a language with limited training data. In this approach, we use well-trained acoustic models of a source language to generate acoustic scores for each feature vector of the target language. These scores are then mapped to the posteriors of context-dependent triphones of the target language using a limited amount of training d...

Complete recognition of continuous Mandarin speech for Chinese language with very large vocabulary but limited training data

This correspondence presents the first known results of complete recognition of continuous Mandarin speech for the Chinese language with very large vocabulary but very limited training data. Various acoustic and linguistic processing techniques were developed, and a prototype system of a continuous speech Mandarin dictation machine has been successfully implemented. The best recognition accurac...

Characteristics of Chinese language models for large vocabulary telephone speech

This paper is concerned with language modeling (LM) for large vocabulary speech recognition in Mandarin Chinese. As the language characteristics of Chinese are quite unique, we investigate some novel techniques in language modeling. We also borrow some techniques that have been applied to other languages. Experiments have been conducted on the Call Home Mandarin, HUB4, and HUB5 corpora obtai...

Publication date: 2010